Cross-Genre Author Profile Prediction Using Stylometry-Based Approach

Authors

  • Shaina Ashraf
  • Hafiz Rizwan Iqbal
  • Rao Muhammad Adeel Nawab
Abstract

The author profiling task aims to identify traits of an author by analyzing his or her written text. This study presents a stylometry-based approach for detecting author traits (gender and age) in cross-genre author profiles. The proposed approach uses several types of stylistic features: 7 lexical features, 16 syntactic features, 26 character-based features and 6 vocabulary richness features (56 stylistic features in total). On the training corpus, the proposed approach obtained promising results, with an accuracy of 0.787 for gender, 0.983 for age and 0.780 for both (jointly detecting age and gender). On the test corpus, the proposed system achieved an accuracy of 0.576 for gender, 0.371 for age and 0.256 for both.
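
The abstract names four feature families but does not list the individual features. As a rough illustration only, the sketch below computes a handful of lexical, character-based and vocabulary richness measures from raw text. The function name `stylometric_features` and every feature in it are assumptions chosen for the example, not the paper's actual feature set; syntactic features are left out because they would normally need a part-of-speech tagger.

```python
import re
import string
from collections import Counter


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few illustrative stylometric features.

    Hypothetical feature set for demonstration; the paper's 56
    features are not enumerated in the abstract.
    """
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    n_words = len(words)
    n_chars = len(text)

    def ratio(numerator: float, denominator: float) -> float:
        # Guard against empty input.
        return numerator / denominator if denominator else 0.0

    return {
        # Lexical features
        "word_count": float(n_words),
        "avg_word_length": ratio(sum(len(w) for w in words), n_words),
        "avg_sentence_length": ratio(n_words, len(sentences)),
        # Character-based features
        "digit_ratio": ratio(sum(c.isdigit() for c in text), n_chars),
        "uppercase_ratio": ratio(sum(c.isupper() for c in text), n_chars),
        "punctuation_ratio": ratio(
            sum(c in string.punctuation for c in text), n_chars
        ),
        # Vocabulary richness features
        "type_token_ratio": ratio(len(counts), n_words),
        "hapax_ratio": ratio(
            sum(1 for c in counts.values() if c == 1), n_words
        ),
    }


if __name__ == "__main__":
    sample = "The cat sat. The dog ran!"
    for name, value in stylometric_features(sample).items():
        print(f"{name}: {value:.3f}")
```

A vector like this, computed per author, could then be passed to any standard classifier to predict gender and age; the abstract does not say which classifier the authors used.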

Similar Resources

Exploring the Effects of Cross-Genre Machine Learning for Author Profiling in PAN 2016

Author profiling deals with the study of various profile dimensions of an author such as age and gender. This work describes our methodology proposed for the task of cross-genre author profiling at PAN 2016. We address gender and age prediction as a classification task and approach this problem by extracting stylistic and lexical features for training a logistic regression model. Furthermore, w...

A Machine Learning-based Intrinsic Method for Cross-topic and Cross-genre Authorship Verification

This paper presents our approach for the Author Identification task in the PAN CLEF Challenge 2015. We identified this year's challenges as the limited amount of training data and the fact that the problems in the sub-corpora are independent in terms of topic and genre. We adopted a machine learning based intrinsic method to verify whether a pair of documents have been written by same or different au...

Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-Genre Profiling, Clustering, Diarization, and Obfuscation

This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of digital text forensic research. PAN 2016 comprises three shared tasks: (i) author identification, addressing author clustering and diarization (or intrinsic plagiarism detection); (ii) author profiling, addressing age and gender prediction from a cross-genre persp...

Profile-based Approach for Age and Gender Identification

This paper describes the participation between the LIDIC research group of the UNSL from Argentina and the Language and Reasoning research group of the UAM Cuajimalpa from Mexico at the PAN’s 2016 Author Profiling task. For the proposed method we adopted a profile-based approach, which has been successfully applied in the Authorship Attribution problem. Thus, we proposed a variation of this tec...

Deep Level Lexical Features for Cross-lingual Authorship Attribution

Crosslingual document classification aims to classify documents written in different languages that share a common genre, topic or author. Knowledge-based methods and others based on machine translation deliver state-of-the-art classification accuracy; however, because of their reliance on external resources, poorly resourced languages present a challenge for these types of methods. In this paper...

Publication date: 2016